Duplicate code

Duplicate code is a computer programming term for a sequence of source code that occurs more than once, either within a program or across different programs owned or maintained by the same entity. Duplicate code is generally considered undesirable for a number of reasons.[1] A minimum requirement is usually applied to the quantity of code that must appear in a sequence for it to be considered duplicate rather than coincidentally similar. Sequences of duplicate code are sometimes known as clones.

The following are some of the ways in which two code sequences can be duplicates of each other:
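For instance, the two hypothetical C functions below are not character-for-character identical. Once whitespace and comments are ignored, however, they are token-for-token identical apart from a consistent renaming of identifiers. A detector working at that level would therefore report them as duplicates. The function names `sumValues` and `addUp` are illustrative only.

```c
#include <stddef.h>

/* Hypothetical function: returns the sum of the first n elements. */
int sumValues(const int *values, size_t n)
{
    int total = 0;
    for (size_t i = 0; i < n; i++) {
        total += values[i];
    }
    return total;
}

/* A copy of sumValues with renamed identifiers, a different layout
   and an added comment; the token sequence is otherwise the same. */
int addUp(const int *items, size_t count) {
    int acc = 0; /* running total */
    for (size_t k = 0; k < count; k++) { acc += items[k]; }
    return acc;
}
```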

How duplicates are created

There are a number of reasons why duplicate code may be created, including:

Problems associated with duplicate code

Code duplication is generally considered a mark of poor or lazy programming style, whereas good coding style is generally associated with code reuse. Duplicating code may make initial development slightly faster, because the developer need not consider how the existing code is already used or how it may be used in the future. The difficulty is that original development is only a small fraction of a product's life cycle. With duplicated code the maintenance costs are much higher, since a change or bug fix made to one copy usually has to be repeated in every other copy. Some of the specific problems include:

Detecting duplicate code

A number of different algorithms have been proposed to detect duplicate code. For example:
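None of the published algorithms is reproduced here, but the basic idea of a naive line-based detector can be sketched in C. The sketch below, whose function names are hypothetical, takes the following steps:

1. Remove all whitespace from each line.
2. Hash the result with FNV-1a.
3. Count pairs of non-overlapping windows of `window` consecutive lines whose hashes all match.

The sketch compares every pair of windows, so it takes quadratic time. Because hashes can collide, a practical tool would also confirm each candidate by comparing the text itself.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* FNV-1a (64-bit) hash of a line, skipping whitespace so that
   indentation and spacing changes do not affect the result. */
static uint64_t hashNormalizedLine(const char *line)
{
    uint64_t h = 14695981039346656037ULL; /* FNV offset basis */
    for (const char *p = line; *p != '\0'; p++) {
        if (isspace((unsigned char)*p)) {
            continue;
        }
        h ^= (unsigned char)*p;
        h *= 1099511628211ULL; /* FNV prime */
    }
    return h;
}

/* Hypothetical helper: counts pairs of non-overlapping windows of
   `window` consecutive lines whose normalized hashes are equal.
   Returns 0 if no such pair can exist or if allocation fails. */
size_t countDuplicateWindows(const char *const *lines, size_t nLines,
                             size_t window)
{
    if (window == 0 || nLines < 2 * window) {
        return 0;
    }
    uint64_t *hashes = malloc(nLines * sizeof *hashes);
    if (hashes == NULL) {
        return 0;
    }
    for (size_t i = 0; i < nLines; i++) {
        hashes[i] = hashNormalizedLine(lines[i]);
    }

    size_t matches = 0;
    for (size_t i = 0; i + window <= nLines; i++) {
        /* Start j after the first window so the two do not overlap. */
        for (size_t j = i + window; j + window <= nLines; j++) {
            size_t k = 0;
            while (k < window && hashes[i + k] == hashes[j + k]) {
                k++;
            }
            if (k == window) {
                matches++;
            }
        }
    }
    free(hashes);
    return matches;
}
```

Applied to the eight lines of the two averaging loops in the example later in this article, a window of two lines finds one match (the loop header followed by its opening brace). A window of three finds none, because `sum1` and `array1` differ from `sum2` and `array2`. A detector that abstracts over identifier names, as token-based approaches can, would instead match the two loop bodies.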

Example of functionally duplicate code

Consider the following code snippet, which calculates the averages of two arrays of four integers each:

extern int array1[];
extern int array2[];

int sum1 = 0;
int sum2 = 0;
int average1 = 0;
int average2 = 0;

/* Sum and average the first four elements of array1 */
for (int i = 0; i < 4; i++)
{
   sum1 += array1[i];
}
average1 = sum1/4;

/* The same loop, duplicated for array2 */
for (int i = 0; i < 4; i++)
{
   sum2 += array2[i];
}
average2 = sum2/4;

The two loops can be replaced by a single function:

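/* Returns the average of the first four elements of array.
   Integer division truncates the result toward zero. */
int calcAverage(const int *array)
{
   int sum = 0;
   for (int i = 0; i < 4; i++)
   {
       sum += array[i];
   }
   return sum / 4;
}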

Calling this function once for each array gives source code with no duplicated loop:

extern int array1[];
extern int array2[];
 
int average1 = calcAverage(array1);
int average2 = calcAverage(array2);
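The fragments above are not complete C on their own. `array1` and `array2` are only declared `extern`, and a variable at file scope cannot be initialized with a function call. The following is a self-contained sketch of the refactored version. It assumes hypothetical contents for the two arrays and a hypothetical helper `computeAverages`.

```c
/* Hypothetical data: the article only declares these arrays extern. */
int array1[4] = {1, 2, 3, 4};
int array2[4] = {10, 20, 30, 40};

/* Returns the average of the first four elements of array, truncated
   toward zero by integer division (as in the article's version). */
int calcAverage(const int *array)
{
    int sum = 0;
    for (int i = 0; i < 4; i++) {
        sum += array[i];
    }
    return sum / 4;
}

/* Hypothetical helper: variables with static storage duration need
   constant initializers in C, so the averages are computed at run time. */
void computeAverages(int *average1, int *average2)
{
    *average1 = calcAverage(array1);
    *average2 = calcAverage(array2);
}
```

With these values, `computeAverages` stores 2 (10 / 4, truncated) and 25 in its two output arguments.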

Tools

Code duplication analysis tools include:

References

  1. Spinellis, Diomidis. "The Bad Code Spotter's Guide". InformIT.com. http://www.informit.com/articles/article.aspx?p=457502&seqNum=5. Retrieved 2008-06-06.
  2. Baker, Brenda S. "A Program for Identifying Duplicated Code". Computing Science and Statistics, 24:49–57, 1992.
  3. Baxter, Ira D., et al. "Clone Detection Using Abstract Syntax Trees".
  4. Rieger, Matthias; Ducasse, Stephane. "Visual Detection of Duplicated Code".
  5. Juergens, E.; Deissenboeck, F.; Hummel, B. "A Workbench for Clone Detection Research".
  6. Juergens, E.; Deissenboeck, F.; Hummel, B.; Wagner, S. "Do Code Clones Matter?"
  7. Li, Zhenmin; Lu, Shan; Myagmar, Suvda; Zhou, Yuanyuan. "CP-Miner: A Tool for Finding Copy-paste and Related Bugs in Operating System Code".

External links